{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "domrnonINuu4"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/assignments/assignment_yourname_class5.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Important Note\n",
    "\n",
    "This assignment is from the older (Keras-based) version of this course and is no longer used for my class. You can find the current asignments here: [updated assignments](https://github.com/jeffheaton/app_deep_learning/tree/main/assignments)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hTAVwaaOFuEf"
   },
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), School of Engineering and Applied Science, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).\n",
    "\n",
    "**Module 5 Assignment: Predicting Home Prices**\n",
    "\n",
    "**Student Name: Your Name**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "4YUdI4CcFuEg"
   },
   "source": [
    "# Assignment Instructions\n",
    "\n",
    "This assignment asks you to train a neural network to predict housing prices.  I provide you with two different datasets.  You will use one data set, which includes house prices for training.  A second dataset, which does not include house prices, will be used for prediction and be submitted for evaluation.  I also give you a third dataset that contains median income for zip codes that must be joined to both the training/test datasets to provide an additional input value.  You must use the median income with your inputs for extra predictive power.\n",
    "\n",
    "You can find all of the needed CSV files here:\n",
    "\n",
    "* [House Prices - Training](https://data.heatonresearch.com/data/t81-558/datasets/houses_train.csv)\n",
    "* [House Prices - Submit](https://data.heatonresearch.com/data/t81-558/datasets/houses_test.csv)\n",
    "* [Median Income by Zipcode](https://data.heatonresearch.com/data/t81-558/datasets/zips.csv)\n",
    "\n",
    "The median income by zipcode provides an additional feature, median income, that you should use in your predictions.  To complete this assignment perform the following steps:\n",
    "\n",
    "* Load the housing prices training data.\n",
    "* Join the median income by zipcode to the training data so that you gain the median income.\n",
    "* Train a model to predict house price when given the following inputs: 'bedrooms', 'bathrooms', 'garage', 'land', 'sqft', 'median_income'.\n",
    "* Load the housing prices test data.  This data does not contain the house price, you must predict this.\n",
    "* Join the median income by zipcode to the test/submit data to gain the median income.\n",
    "* Predict prices for the test/submit data.  \n",
    "* Create a submission dataset that contains the house id (from the test/submit data) and the predicted price for that house.  Include no other fields.\n",
    "* Submit this dataset and see how close you are to the actual values.\n",
    "\n",
    "The RMSE should be less than 10K, which means that you are predicting within +/- $10,000 the actual price is sufficient to complete the assignment.  You may also wish to see if you can get your prediction even more accurate.\n",
    "\n",
    "A few notes/suggestions on this assignment:\n",
    "\n",
    "* You will see really high loss rates, due to the face that the target value is the price of a house (which is a large number)\n",
    "* You might want to add a RMSE metric, which is the error in the same units target value. \n",
    "* I was able to obtain an RMSE of 86; yet still had a fairly high loss of 25920990.\n",
    "* To add RMSE as a metric:\n",
    "```\n",
    "model.compile(loss='mean_squared_error', metrics=[tf.keras.metrics.RootMeanSquaredError()], optimizer=opt)\n",
    "```\n",
    "# Google CoLab Instructions\n",
    "If you are using Google CoLab, it will be necessary to mount your GDrive to send your notebook during the submit process. Running the following code will map your GDrive to /content/drive."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 51
    },
    "colab_type": "code",
    "id": "9ZvFFOR5A-Wo",
    "outputId": "75ebcf96-d9ad-4da8-8de5-b6bce63100d0"
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    from google.colab import drive\n",
    "    drive.mount('/content/drive', force_remount=True)\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "4qrsT_KZFuEh"
   },
   "source": [
    "# Assignment Submit Function\n",
    "\n",
    "You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems. \n",
    "\n",
    "**It is unlikely that should need to modify this function.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "43KOAL0OFuEi"
   },
   "outputs": [],
   "source": [
    "import base64\n",
    "import os\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import requests\n",
    "import PIL\n",
    "import PIL.Image\n",
    "import io\n",
    "\n",
    "# This function submits an assignment.  You can submit an assignment as much as you like, only the final\n",
    "# submission counts.  The paramaters are as follows:\n",
    "# data - List of pandas dataframes or images.\n",
    "# key - Your student key that was emailed to you.\n",
    "# no - The assignment class number, should be 1 through 1.\n",
    "# source_file - The full path to your Python or IPYNB file.  This must have \"_class1\" as part of its name.  \n",
    "# .             The number must match your assignment number.  For example \"_class2\" for class assignment #2.\n",
    "def submit(data,key,no,source_file=None):\n",
    "    raise Exception(\"If you are a current student, you are using an old version of this assignment. Refer to: https://github.com/jeffheaton/app_deep_learning/tree/main/assignments\")\n",
    "    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')\n",
    "    if source_file is None: source_file = __file__\n",
    "    suffix = '_class{}'.format(no)\n",
    "    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))\n",
    "    with open(source_file, \"rb\") as image_file:\n",
    "        encoded_python = base64.b64encode(image_file.read()).decode('ascii')\n",
    "    ext = os.path.splitext(source_file)[-1].lower()\n",
    "    if ext not in ['.ipynb','.py']: raise Exception(\"Source file is {} must be .py or .ipynb\".format(ext))\n",
    "    payload = []\n",
    "    for item in data:\n",
    "        if type(item) is PIL.Image.Image:\n",
    "            buffered = BytesIO()\n",
    "            item.save(buffered, format=\"PNG\")\n",
    "            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})\n",
    "        elif type(item) is pd.core.frame.DataFrame:\n",
    "            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode(\"ascii\")})\n",
    "    r= requests.post(\"https://api.heatonresearch.com/assignment-submit\",\n",
    "        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})\n",
    "    if r.status_code==200:\n",
    "        print(\"Success: {}\".format(r.text))\n",
    "    else: print(\"Failure: {}\".format(r.text))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "collapsed": true,
    "id": "zd5fX98YFuEm",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "source": [
    "# Assignment #5 Sample Code\n",
    "\n",
    "The following code provides a starting point for this assignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "lIB3MmKuFuEn",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "from scipy.stats import zscore\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation, Dropout\n",
    "import pandas as pd\n",
    "import io\n",
    "import requests\n",
    "import numpy as np\n",
    "from sklearn import metrics\n",
    "from sklearn.model_selection import KFold\n",
    "from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from scipy.stats import zscore\n",
    "\n",
    "# This is your student key that I emailed to you at the beginnning of the semester.\n",
    "key = \"Gx5en9cEVvaZnjut6vfLm1HG4ZdddsI32sgldAXj\"  # This is an example key and will not work.\n",
    "\n",
    "# You must also identify your source file.  (modify for your local setup)\n",
    "file='/content/drive/My Drive/Colab Notebooks/assignment_yourname_class5.ipynb'  # Google CoLab\n",
    "# file='C:\\\\Users\\\\jeffh\\\\projects\\\\t81_558_deep_learning\\\\assignments\\\\assignment_yourname_class5.ipynb'  # Windows\n",
    "# file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_yourname_class5.ipynb'  # Mac/Linux\n",
    "\n",
    "# Begin assignment\n",
    "df_houses_train = pd.read_csv(\"https://data.heatonresearch.com/data/t81-558/datasets/houses_train.csv\")\n",
    "df_houses_submit = pd.read_csv(\"https://data.heatonresearch.com/data/t81-558/datasets/houses_test.csv\")\n",
    "df_zips = pd.read_csv(\"https://data.heatonresearch.com/data/t81-558/datasets/zips.csv\")\n",
    "\n",
    "\n",
    "submit(source_file=file,data=[df_submit],key=key,no=5)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "collapsed_sections": [],
   "name": "assignment_solution_class5.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.10 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
